Bootstrap Confidence Intervals

The bootstrap can be used to find an approximate sampling distribution for a given estimator/statistic — the bootstrap distribution. Quantiles of this distribution provide an easy way to generate a confidence interval for the statistic.

The Mean

Consider the sample mean, \(\hat{\mu} = \sum_{i=1}^{n} X_i / n = \bar{X}\). We can use the boot package to find the bootstrap distribution. We do so for the Weight data.

  library(boot)      ### provides boot() and boot.ci()
  library(ggplot2)   ### provides ggplot()
  htwt <- read.csv("http://facweb1.redlands.edu/fac/jim_bentley/data/Math%20312/regression/htwt.csv")
  head(htwt)
##   Height Weight Group
## 1     64    159     1
## 2     63    155     2
## 3     67    157     2
## 4     60    125     1
## 5     52    103     2
## 6     58    122     2
  ggplot(htwt, aes(x=Weight)) + geom_histogram(binwidth = 10)

  ggplot(htwt, aes(x=Weight)) + geom_dotplot() + ylab("Proportion")
## `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.

  ### boot() passes the data d and a vector of resampled indices i to the statistic function
  b.mean <- function(d, i){mean(d[i])}
  wt.mean.boot <- boot(htwt$Weight, b.mean, R=9999)
  wt.mean.boot
## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = htwt$Weight, statistic = b.mean, R = 9999)
## 
## 
## Bootstrap Statistics :
##     original    bias    std. error
## t1*    139.6 -0.100375    9.392069
  plot(wt.mean.boot)

While the raw data appear to be non-normally distributed, the distribution of the bootstrap means appears to be approximately normal; the Central Limit Theorem seems to have kicked in. Note that the estimator appears to be nearly unbiased. The bootstrap standard error is estimated to be 9.3920693, which is close to \(s/\sqrt{n}=\) 9.6423954. All of this will prove interesting and useful later.

To get the bootstrap confidence interval, we use a quantile approach: we trap the desired proportion of “plausible” values between two quantiles found by trimming the required percentage off each end of the bootstrap distribution.

  quantile(wt.mean.boot$t, c(0.005, 0.025, 0.05, 0.95, 0.975, 0.995))
##     0.5%     2.5%       5%      95%    97.5%    99.5% 
## 117.2485 122.0000 124.3000 155.2500 158.6500 164.5500

We see that a 95% CI for the mean is 122 to 158.65 and a 99% CI for the mean is 117.2485 to 164.55.

The confidence intervals from above correspond most closely to the Percentile intervals given by the boot.ci function below.

  boot.ci(wt.mean.boot)
## Warning in boot.ci(wt.mean.boot): bootstrap variances needed for studentized
## intervals
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 9999 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = wt.mean.boot)
## 
## Intervals : 
## Level      Normal              Basic         
## 95%   (121.3, 158.1 )   (120.5, 157.2 )  
## 
## Level     Percentile            BCa          
## 95%   (122.0, 158.7 )   (122.9, 159.9 )  
## Calculations and Intervals on Original Scale

The Normal CI given above is calculated as \(\hat{\mu} \pm z_{(1-c)/2}\, \mbox{se}(\hat{\mu})\), or equivalently \(\hat{\mu} \pm z_{\alpha/2}\, \mbox{se}(\hat{\mu})\), where \(Z \sim \mbox{N}(0,1)\), \(c = 1-\alpha\) is the confidence level, and \(\mbox{se}(\hat{\mu})\) is the bootstrap standard error. (boot.ci also subtracts the estimated bootstrap bias from \(\hat{\mu}\) before forming this interval.)
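
As a check, this interval can be reproduced by hand from the components of the boot object (t0 holds the original statistic and t the bootstrap replicates); the sketch below should agree with boot.ci's Normal interval up to rounding.

  ### Hand-computed normal-approximation CI from the bootstrap replicates
  se.boot <- sd(as.vector(wt.mean.boot$t))             ### bootstrap standard error
  bias.boot <- mean(wt.mean.boot$t) - wt.mean.boot$t0  ### estimated bias
  (wt.mean.boot$t0 - bias.boot) + qnorm(c(0.025, 0.975)) * se.boot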

The Median

Consider the sample median, \(m\), chosen such that \(P(X \le m)=P(X \ge m)=\frac{1}{2}\). We can use the boot package to find the bootstrap distribution of the sample median. We do so using the Weight data from above.

  b.median <- function(d, i){median(d[i])}
  wt.median.boot <- boot(htwt$Weight, b.median, R=9999)
  wt.median.boot
## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = htwt$Weight, statistic = b.median, R = 9999)
## 
## 
## Bootstrap Statistics :
##     original   bias    std. error
## t1*    123.5 8.171267    15.94761
  plot(wt.median.boot)

While the distribution of the bootstrap means seems to be normal, the distribution of the bootstrap medians does not. Since the Central Limit Theorem is a statement about the asymptotic distribution of sums of random variables (recalling the proof and the use of \(M_{\sum X_i}(t)\)), this should not be surprising. Note that the estimator appears to be biased. The standard error is estimated to be 15.9476114.

Because of the granularity of the bootstrap distribution we might be uncomfortable computing the bootstrap confidence interval for the median.

  quantile(wt.median.boot$t, c(0.005, 0.025, 0.05, 0.95, 0.975, 0.995))
##    0.5%    2.5%      5%     95%   97.5%   99.5% 
## 106.500 111.000 112.000 157.000 158.000 174.505

One suggested method for dealing with the granularity is to add a little random noise to smooth things out (sometimes called a smoothed bootstrap). Some authors suggest using \(\varepsilon \sim \mbox{N}\left(0, 1/n \right)\), i.e., noise with standard deviation \(1/\sqrt{n}\), as in the code below.

  hist(wt.median.boot$t+rnorm(9999, 0, 1/sqrt(nrow(htwt))))

  qqnorm(wt.median.boot$t+rnorm(9999, 0, 1/sqrt(nrow(htwt))))

  quantile(wt.median.boot$t+rnorm(9999, 0, 1/sqrt(nrow(htwt))), c(0.005, 0.025, 0.05, 0.95, 0.975, 0.995))
##     0.5%     2.5%       5%      95%    97.5%    99.5% 
## 106.2672 110.9073 112.0770 156.9550 158.1348 174.8073

The confidence intervals computed above can be compared to those generated by boot.ci.

  boot.ci(wt.median.boot)
## Warning in boot.ci(wt.median.boot): bootstrap variances needed for studentized
## intervals
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 9999 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = wt.median.boot)
## 
## Intervals : 
## Level      Normal              Basic         
## 95%   ( 84.1, 146.6 )   ( 89.0, 136.0 )  
## 
## Level     Percentile            BCa          
## 95%   (111, 158 )   (110, 157 )  
## Calculations and Intervals on Original Scale

Given the apparent lack of normality in the bootstrap distribution of the median, a normal-approximation confidence interval is not appropriate here; the percentile and BCa intervals are more defensible choices.

Difference of Means

We can use the boot package to look at more interesting statistics. Consider creating a confidence interval for the difference of the means of two samples. As an example, we can look at the difference in mean Weight between the two Groups in the htwt data frame.

  wtgrp <- htwt[,c("Weight", "Group")]
  meanDiff <- function(x, i){
                             ### Compute group means
                             y <- tapply(x[i,1], x[i,2], mean)
                             ### Return the difference
                             y[1]-y[2]
                            }
   wtgrp.meanDiff.boot <- boot(wtgrp, meanDiff, R=9999)
   wtgrp.meanDiff.boot
## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = wtgrp, statistic = meanDiff, R = 9999)
## 
## 
## Bootstrap Statistics :
##     original      bias    std. error
## t1*       28 -0.06787881    18.90771
   plot(wtgrp.meanDiff.boot)

   boot.ci(wtgrp.meanDiff.boot)
## Warning in boot.ci(wtgrp.meanDiff.boot): bootstrap variances needed for
## studentized intervals
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 9999 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = wtgrp.meanDiff.boot)
## 
## Intervals : 
## Level      Normal              Basic         
## 95%   (-8.99, 65.13 )   (-8.22, 65.33 )  
## 
## Level     Percentile            BCa          
## 95%   (-9.33, 64.22 )   (-9.69, 63.85 )  
## Calculations and Intervals on Original Scale
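
For comparison, the classical two-sample t interval for the same difference can be obtained with t.test; by default R produces the Welch (unequal variance) interval, so it will not match the bootstrap intervals exactly.

  ### Classical (Welch) two-sample t interval for the difference in mean Weight
  t.test(Weight ~ Group, data = htwt)$conf.int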

Normal and t Confidence Intervals

The bootstrap distribution of a statistic/estimator is supposed to resemble the sampling distribution of the same statistic/estimator. So, the distribution of the mean found above should be representative of what we would see if we drew all possible samples of size \(n\) from the population.

We note that 95% of the bootstrap sample means lie between the 2.5% and 97.5% quantiles:

  quantile(wt.mean.boot$t, c(0.025, 0.975))
##   2.5%  97.5% 
## 122.00 158.65

Assuming normality (which is supported by the plots generated above), the CLT suggests that the sample mean is distributed approximately normally with mean \(\mu = 139.6\) and standard deviation \(s_{\bar{X}} = 43.1221/\sqrt{20} =\) 9.6423947. A plot of the normal (red/dotted) and t (blue/dashed) distributions for the weight data shows their differences.

  (n <- length(htwt$Weight))
## [1] 20
  (mu <- mean(htwt$Weight))
## [1] 139.6
  (se <- sqrt(var(htwt$Weight)/n))
## [1] 9.642395
  xbar <- mu + seq(-5, 5, by=0.01)*se
  fz <- dnorm((xbar-mu)/se)
  ft <- dt((xbar-mu)/se, n-1)
  plot(xbar, fz, type="l", lty=3, col="red", ylim=range(c(fz, ft)))
  lines(xbar, ft, lty=2, col="blue")
  abline(v=qnorm(c(0.025,0.975), mu, se), lty=3, col="red")
  abline(v=mu + qt(c(0.025,0.975), n-1)*se, lty=2, col="blue")
  abline(h=0)

Confidence intervals based upon the normal and t distributions also demonstrate the differences.

  qnorm(c(0.025, 0.975), mu, se)
## [1] 120.7013 158.4987
  mu+qt(c(0.025, 0.975), n-1)*se
## [1] 119.4182 159.7818

If we look at the bootstrap means, we see that

  (within196se <- table(wt.mean.boot$t >= qnorm(0.025, mu, se) & wt.mean.boot$t <= qnorm(0.975, mu, se))/length(wt.mean.boot$t)*100)
## 
##    FALSE     TRUE 
##  4.30043 95.69957

So, about 95.7% of the bootstrap means are within 1.96 standard errors of \(\mu\). We can turn this inside out and conclude that \(\mu\) is within 1.96 standard errors of about 95.7% of the sample means. Hence, if we use \(\bar{X} \pm 1.96 \cdot \mbox{se}\left(\bar{X}\right)\) to create our intervals, approximately 95% of the intervals will contain \(\mu\).

The plots below show which of 100 sampled bootstrap means fall within 1.96 and 2.576 standard errors of \(\mu\), and which of the corresponding 95% and 99% CIs cover \(\mu\).

  nreps <- 100
  Sample <- 1:nreps
  xbar <- sample(wt.mean.boot$t, nreps)
  mu <- mean(wt.mean.boot$t)
  l95 <- xbar - 1.96 * se
  u95 <- xbar + 1.96 * se
  l99 <- xbar - 2.576 * se
  u99 <- xbar + 2.576 * se
  covers95 <- l95 <= mu & mu <= u95
  covers99 <- l99 <= mu & mu <= u99
  df <- data.frame(Sample, l99, l95, xbar, u95, u99)
  
  p <- ggplot(df, aes(x=xbar, y=Sample)) + geom_point() + geom_vline(xintercept = mu) + xlim(range(c(l99,u99)))
  p + geom_vline(xintercept = mu + c(-2.576, -1.96, 1.96, 2.576)*se, lty=2)

  p + geom_segment(aes(x = l99, y = Sample, xend = u99, yend = Sample, color = covers99)) 

  p + geom_segment(aes(x = l95, y = Sample, xend = u95, yend = Sample, color = covers95))

To formalize this approach, note that above we used the bootstrap to generate percentile confidence intervals, and we used the bootstrap standard error to create normal confidence intervals for those bootstrap distributions that appeared to be normal.

It turns out that, because of the CLT, when we know the population standard deviation \(\sigma_X\) we can use the normal approximation for the mean and the sum. In this case the standard error of the mean is \(\sigma_{\bar{X}} = \sigma_X / \sqrt{n}\). When the population standard deviation is not known, we approximate the standard error by \(s_{\bar{X}} = s_X / \sqrt{n}\). When we make this substitution, we also substitute a t distribution (on \(n-1\) degrees of freedom) for the normal.

  n <- length(htwt$Weight)
  s.xbar <- sqrt(var(htwt$Weight)/n)
  s.xbar
## [1] 9.642395
  s.boot <- sqrt(var(wt.mean.boot$t))
  s.boot
##          [,1]
## [1,] 9.392069
  mean(wt.mean.boot$t)
## [1] 139.4996
  wt.mean.boot
## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = htwt$Weight, statistic = b.mean, R = 9999)
## 
## 
## Bootstrap Statistics :
##     original    bias    std. error
## t1*    139.6 -0.100375    9.392069
  ### Use the bootstrap std error
  mean(htwt$Weight) + qnorm(c(0.005, 0.025, 0.05, 0.95, 0.975, 0.995))*s.boot
## Warning in qnorm(c(0.005, 0.025, 0.05, 0.95, 0.975, 0.995)) * s.boot: Recycling array of length 1 in vector-array arithmetic is deprecated.
##   Use c() or as.vector() instead.
## [1] 115.4076 121.1919 124.1514 155.0486 158.0081 163.7924
  ### Pretend that the sample std dev is the pop std dev
  mean(htwt$Weight) + qnorm(c(0.005, 0.025, 0.05, 0.95, 0.975, 0.995))*s.xbar
## [1] 114.7628 120.7013 123.7397 155.4603 158.4987 164.4372
  ### Compute the CI using the sample std dev
  mean(htwt$Weight) + qt(c(0.005, 0.025, 0.05, 0.95, 0.975, 0.995), n-1)*s.xbar
## [1] 112.0137 119.4182 122.9270 156.2730 159.7818 167.1863
  boot.ci(wt.mean.boot)
## Warning in boot.ci(wt.mean.boot): bootstrap variances needed for studentized
## intervals
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 9999 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = wt.mean.boot)
## 
## Intervals : 
## Level      Normal              Basic         
## 95%   (121.3, 158.1 )   (120.5, 157.2 )  
## 
## Level     Percentile            BCa          
## 95%   (122.0, 158.7 )   (122.9, 159.9 )  
## Calculations and Intervals on Original Scale

We see that the confidence intervals are slightly different. However, in this case, the differences are practically minimal — one or two pounds when looking at a mean of 139.6 pounds.
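
As an aside, the recycling warning above appears because var() applied to the \(R \times 1\) matrix wt.mean.boot$t returns a \(1 \times 1\) matrix. Coercing the replicates to a vector first gives the same interval without the warning:

  ### Coerce the bootstrap replicates to a vector to avoid the 1 x 1 matrix
  s.boot <- sd(as.vector(wt.mean.boot$t))
  mean(htwt$Weight) + qnorm(c(0.005, 0.025, 0.05, 0.95, 0.975, 0.995)) * s.boot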

Theoretical CIs

The old-school, theoretical approach to creating confidence intervals is based upon the CDF and its inverse. In what follows, remember that the confidence level \(c\) is related to the hypothesis test's type I error rate \(\alpha\) by \(c=1-\alpha\).

Proportion

For the proportion, we note that each of the outcomes can be coded as a “success” or “failure”. This leads us to the use of \(X_i\) iid \(B(1,p)\) — or Bernoulli trials. The total number of “successes” is \(\sum_{i=1}^n X_i\) and \(\hat{p} = \bar{X}\). Further, \(E\left(\bar{X}\right) = p\) and \(Var\left(\bar{X}\right) = p(1-p)/n\).

By the CLT, for \(n\) “large” (variously \(np \ge 10\) and \(n(1-p) \ge 10\), or \(np(1-p) \ge 5\), etc.), \(\bar{X} \sim N\left(p, p(1-p)/n\right)\). To obtain a \(c = 1-\alpha\) level CI for \(p\), we approximate the standard error by \(\widehat{\mbox{se}}(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n}\) and look at (writing \(z_{\alpha/2}\) for the upper \(\alpha/2\) critical value, so that \(z_{1-\alpha/2} = -z_{\alpha/2}\)) \[ \begin{align*} 1-\alpha & = P\left(z_{1-\alpha/2} \le \frac{\hat{p}-p}{\sqrt{\hat{p}(1-\hat{p})/n}} \le z_{\alpha/2} \right) \\ &= P\left(-\hat{p} + z_{1-\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} \le -p \le -\hat{p} + z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} \right) \\ &= P\left(\hat{p} - z_{1-\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} \ge p \ge \hat{p} - z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} \right) \\ &= P\left(\hat{p} - z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} \right) \end{align*} \] Thus, an equal-tailed \(c = 1-\alpha\) confidence interval for \(p\) is \(\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}\).
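
As a quick numerical sketch (the counts below are hypothetical, not taken from the htwt data), the interval can be computed directly in R:

  ### Hypothetical example: 42 successes in 60 Bernoulli trials
  n.trials <- 60
  p.hat <- 42 / n.trials
  p.hat + qnorm(c(0.025, 0.975)) * sqrt(p.hat * (1 - p.hat) / n.trials)  ### 95% CI for p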

Mean

For \(X_i\) iid with \(E(X_i)=\mu\) and \(Var(X_i)=\sigma^2\), the CLT indicates that when \(n\) is “large”, \(\bar{X} \sim N\left(\mu, \sigma^2/n\right)\). When \(\sigma^2\) is known, and the \(X_i \sim N(\mu, \sigma^2)\) or \(n\) is large (\(n \ge 30\)), a \(c = 1-\alpha\) level CI for \(\mu\) follows from \[ \begin{align*} 1-\alpha & = P\left(z_{1-\alpha/2} \le \frac{\hat{\mu}-\mu}{\sigma/\sqrt{n}} \le z_{\alpha/2} \right) \\ &= P\left(-\hat{\mu} + z_{1-\alpha/2}\sigma/\sqrt{n} \le -\mu \le -\hat{\mu} + z_{\alpha/2}\sigma/\sqrt{n} \right) \\ &= P\left(\hat{\mu} - z_{1-\alpha/2}\sigma/\sqrt{n} \ge \mu \ge \hat{\mu} - z_{\alpha/2}\sigma/\sqrt{n} \right) \\ &= P\left(\hat{\mu} - z_{\alpha/2}\sigma/\sqrt{n} \le \mu \le \hat{\mu} + z_{\alpha/2}\sigma/\sqrt{n} \right) \end{align*} \] Thus, a symmetric, two-tailed \(c = 1-\alpha\) confidence interval for \(\mu\) is \(\bar{x} \pm z_{\alpha/2} \cdot \sigma/\sqrt{n}\).
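
A minimal sketch using the Weight data, pretending (as was done above) that the sample standard deviation is the known population value \(\sigma\):

  ### z-based CI for mu, treating s as if it were sigma
  sigma.x <- sd(htwt$Weight)
  mean(htwt$Weight) + qnorm(c(0.025, 0.975)) * sigma.x / sqrt(length(htwt$Weight))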

Variance

Recall that for \(X_i\) iid \(N(0,1)\), \(\sum_{i=1}^n X_i^2 \sim \chi^2_n\). With a bit of hand waving (note that \(\bar{X}\) and \(s^2\) are independent for normal samples, etc.), we see that \((n-1)s^2/\sigma^2 \sim \chi^2_{n-1}\). Writing \(\chi^2_{\alpha/2}\) for the upper and \(\chi^2_{1-\alpha/2}\) for the lower critical value, and remembering that taking reciprocals reverses the inequalities, \[ \begin{align*} 1-\alpha & = P\left(\chi^2_{1-\alpha/2} < \frac{(n-1)s^2}{\sigma^2} < \chi^2_{\alpha/2}\right) \\ & = P\left(\frac{\chi^2_{1-\alpha/2}}{(n-1)s^2} < \frac{1}{\sigma^2} < \frac{\chi^2_{\alpha/2}}{(n-1)s^2}\right) \\ & = P\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right) \end{align*} \] Thus, a two-tailed, \(c=1-\alpha\) confidence interval for \(\sigma^2\) is \[ \left[\frac{(n-1)s^2}{\chi^2_{\alpha/2}},\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right] \]
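
A sketch of this interval for the variance of Weight; note that qchisq(0.975, n-1) plays the role of the upper critical value \(\chi^2_{\alpha/2}\), so it appears in the denominator of the lower limit:

  ### 95% CI for the population variance of Weight
  n <- length(htwt$Weight)
  s2 <- var(htwt$Weight)
  (n - 1) * s2 / qchisq(c(0.975, 0.025), n - 1)  ### (lower, upper)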

Sample Size Estimation

Prior to collecting data, one might be required to determine a sample size that will support a certain margin of error (me) at a certain confidence level. A guess for \(n\) can be made by noting that \(\mbox{me} = q^* \cdot \sigma_{\hat{\theta}}\), where \(q^*\) is the critical value from an appropriate distribution and \(\sigma_{\hat{\theta}}\) is the standard error of the estimator.

Mean

The appropriate sample size for a confidence interval for \(\mu\) when \(\sigma_X\) is known can be computed by recalling that \(\mbox{me} = z_{\alpha/2}\cdot \sigma_{\bar{X}}\); we need only solve for \(n\). Thus, \[ \mbox{me} = z_{\alpha/2} \cdot \frac{\sigma_X}{\sqrt{n}} \] implies that we should choose \(n\) at least as large as \[ n = \left(\frac{z_{\alpha/2}\cdot\sigma_X}{\mbox{me}}\right)^2 \]
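
For example, to estimate the mean Weight to within a hypothetical margin of error of 5 pounds with 95% confidence, using the sample standard deviation of the Weights as a planning value for \(\sigma_X\):

  ### Sample size for a 95% CI for mu with a (hypothetical) margin of error of 5 pounds
  me <- 5
  sigma.plan <- sd(htwt$Weight)  ### planning value for sigma_X
  ceiling((qnorm(0.975) * sigma.plan / me)^2)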

Proportion

For the proportion, \(\hat{p}\), recall that \[ \mbox{me} = z_{\alpha/2} \cdot \sqrt{\frac{p(1-p)}{n}} \] Again, solving for \(n\) we get \[ n = \left(\frac{z_{\alpha/2}}{\mbox{me}}\right)^2 p(1-p) \] Unfortunately, \(p\) is unknown. Many statisticians plug in \(p=1/2\) as this maximizes \(p(1-p)\), and hence \(n\). Others prefer to use \(\hat{p}\) from a prior study or they run a pilot study.
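
For example, with a hypothetical 3 percentage point margin of error at 95% confidence and the conservative planning value \(p = 1/2\):

  ### Sample size for a proportion: 95% confidence, 0.03 margin of error (hypothetical)
  me <- 0.03
  p.plan <- 0.5  ### conservative; maximizes p(1-p)
  ceiling(qnorm(0.975)^2 * p.plan * (1 - p.plan) / me^2)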